1. Introduction



The Web is a very dynamic and constantly evolving phenomenon: new Web pages are
constantly created, and old pages are frequently modified. Therefore, it is often very
difficult to follow all the changes made at different Web sites, especially if one wants to
keep track of the changes made at numerous sites. Even if one can follow simple
changes, such as the addition and deletion of certain keywords at a site, it is much more
difficult to follow complicated changes and relationships across multiple sites. For
example, it may be difficult to detect the situation when several companies suddenly
create links to each other's Web sites following the introduction of a new product by one
of them. Nevertheless, it may be important to be able to detect such changes because
some of them contain important information about new developments in or among the
organizations (for example, once we detect that the companies started pointing at each
other's Web sites, it may be an indication that they are forming an alliance, and it may be
useful to know that).

To deal with the problem of monitoring changes at Web sites, some vendors developed
systems that provide simple monitoring capabilities. For example,



Unfortunately, Lycos' monitoring capabilities are quite simplistic and do not allow
the monitoring of more complex relationships, such as the one described above.

This invention presents a method for monitoring complex changes occurring at one or
several Web sites. We believe that it will be useful to a broad range of Web users,
such as financial analysts, industry analysts, investors, regulatory agencies, and
advertisers.

In the next section we describe a monitoring and probing specification language, and in
Section 3 we explain how this language can be implemented. Finally, in Section 4 we
describe its practical embodiments.



2. Monitoring/Probing Specification Language

The general structure of a monitoring/probing rule (or a trigger; we use these terms
interchangeably in this document) is

WHEN <events>
IF <conditions>
THEN <probing-actions>

where events, conditions, and probing-actions are briefly defined as follows (they will be 
described in detail later). 

Event: An event is a change in state of one or more objects that we are monitoring. For
instance, the appearance or disappearance of one or more keywords, the appearance of a
new link between two sites, and a change in the physical attributes of a Web page are
examples of events.

Conditions: Conditions are constraints that apply to events and act as filters that refine 
the space of events to result in a smaller subset that are of interest to entities that monitor 
the web. For instance, the appearance of the keyword "push technology" is an event that 
is of interest only if the keyword "cable modem" does not exist anywhere on the site (the 
qualifier "only if cable modem does not exist anywhere else" is a condition that is applied 
to the event). 

Probing Action: The purpose of the probing action is to investigate what is "going on" at 
the site or a collection of sites once the events in the WHEN-clause occurred and satisfied 
the conditions in the IF-clause. For instance, the system may explore what is "going on" 
at a predefined list of web sites when the keyword "cable modem" recently appeared in 
one or more of them (if the events and condition specified above hold good). A probing 
action usually returns some information that is the result of the examination of one or 
more web sites in order to understand deeper reasons for the occurrence of events in the 
WHEN-clause and the conditions in the IF-clause.

The following examples illustrate the structure of WHEN-IF-THEN rules: 

1. WHEN the keyword "Cable Modem" appears on a site
   IF the keyword "Push Technology" does not appear at the site
   THEN find out on how many sites (from a pre-defined list of sites) the keyword
   "Cable Modem" appeared, and of these how many feature the keyword
   "Push Technology"

2. IF (several Web sites are linked into a "Star"¹ configuration AND the keyword "Cable
   Modem" is found on all the nodes in the configuration)
   THEN (find all such configurations, the members of which contain the keyword "Cable
   Modem").

¹ "Star" is a user-defined predicate, intuitively meaning that there is a central site that is linked to all other
sites.



3. IF (a keyword set consisting of ("Push technologies", "Web TV", "Channel" [...]) [...]
   all the nodes (members) [...])
   THEN (find all such configurations where such keywords appear; return the list of
   the corresponding URLs).

4. IF (the number of visits to a site increases at a rate [...]
   than r; return the set of corresponding URLs).

5. IF (a keyword appears at a site AND links to that site from other sites disappear)
THEN (find all the sites from which links to this site disappeared; 
find if these sites have any keywords in common; 
find all the sites which are now linked to this site; 
find if these sites have any keywords in common; 
return four lists) 

The WHEN- and IF-clauses form the monitoring part of the rule (trigger), and the
THEN-clause forms the probing part of the rule. The monitoring part of the trigger
watches for certain events to occur that satisfy certain conditions. Once the monitor
detects appropriate events, it does not necessarily inform the user about this, for the
reasons explained below. It tries to explore the situation in which these events occurred
by executing certain probing actions specified in the probing part of the trigger (i.e. in the
THEN-clause). These probing actions will be described below. The structure of a rule is
similar to the structure of triggers used in active databases. However, the nature of
events, conditions and especially probing actions is substantially different, as will be
explained below.
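
As an illustration only, the WHEN-IF-THEN structure can be sketched in Python. The `Trigger` class, the snapshot layout (a mapping from site name to the set of keywords found there), and the site names below are assumptions made for this sketch and are not part of the specification language itself. The sketch encodes Example 1 above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Assumed layout: a snapshot maps a site name to the set of keywords found on it.
Snapshot = Dict[str, Set[str]]


@dataclass
class Trigger:
    """WHEN <events> IF <conditions> THEN <probing-actions> (hypothetical layout)."""
    when: Callable[[Snapshot, Snapshot, str], bool]  # event: compares old and new state
    if_: Callable[[Snapshot, str], bool]             # condition: filters the event
    then: Callable[[Snapshot], dict]                 # probing action: explores further


def keyword_appears(keyword: str):
    """Event: the keyword is on the site now but was not there before."""
    def event(old: Snapshot, new: Snapshot, site: str) -> bool:
        return keyword in new.get(site, set()) and keyword not in old.get(site, set())
    return event


def keyword_absent(keyword: str):
    """Condition: the keyword does not appear on the site."""
    def condition(new: Snapshot, site: str) -> bool:
        return keyword not in new.get(site, set())
    return condition


def count_sites(first: str, second: str):
    """Probe: how many sites carry `first`, and how many of those also carry `second`."""
    def probe(new: Snapshot) -> dict:
        with_first = [s for s, kws in new.items() if first in kws]
        with_both = [s for s in with_first if second in new[s]]
        return {"with_first": len(with_first), "with_both": len(with_both)}
    return probe


def run(trigger: Trigger, old: Snapshot, new: Snapshot) -> List[dict]:
    """Run the probe once for every site where the event occurs and the condition holds."""
    results = []
    for site in new:
        if trigger.when(old, new, site) and trigger.if_(new, site):
            results.append(trigger.then(new))
    return results


rule1 = Trigger(
    when=keyword_appears("cable modem"),
    if_=keyword_absent("push technology"),
    then=count_sites("cable modem", "push technology"),
)
old = {"a.example": set(), "b.example": {"cable modem", "push technology"}}
new = {"a.example": {"cable modem"}, "b.example": {"cable modem", "push technology"}}
print(run(rule1, old, new))
```

Note that the user is not alerted by the event itself; only the result of the probing action is returned.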

An event in the WHEN-clause may be atomic or composite. A composite event is the
combination of two or more atomic events. One of the most popular methods to define
composite events is through the conjunction of atomic events, i.e. as E1 and E2 and ...
and En, where E1, E2, ..., En are atomic events, such as "a link between two sites appears,"
"a keyword appears within a site," or "a page (or link) was deleted." However, composite
events can be defined in more general terms as well (e.g. as a Conjunctive or a
Disjunctive Normal Form (CNF or DNF) of its atomic events). Atomic events belong to
one of the following classes (but atomic events are not limited just to these classes):

1. A link appears/disappears.

2. A keyword (or a group of keywords) appears on a page.

3. A page's text is modified.

4. The physical attributes of a page change.

5. A visitor visits a page.
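
A minimal sketch of how atomic events of the first class could be detected, and how a composite event could be formed as a conjunction. It assumes that the set of links is sampled once per monitoring period; the function names and site names are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (origin site, target site)


def link_events(previous: Set[Link], current: Set[Link]):
    """Return (appeared, disappeared) links between two monitoring periods."""
    return current - previous, previous - current


def all_of(*events: bool) -> bool:
    """Composite event as a conjunction: E1 and E2 and ... and En."""
    return all(events)


previous = {("a.example", "b.example"), ("c.example", "b.example")}
current = {("a.example", "b.example"), ("a.example", "c.example")}
appeared, disappeared = link_events(previous, current)

# Composite event: one link appeared AND another link disappeared.
composite = all_of(
    ("a.example", "c.example") in appeared,
    ("c.example", "b.example") in disappeared,
)
```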

Although events in the monitoring part of the rule are useful, it is not necessary to use
them in general for monitoring Web activities. The reason for that is that they can be
[...] (as used in WebWatch) [...] in the IF-clause using the methods known to
a person having ordinary skill in the art. This approach is adopted for expositional
simplicity, and it is essential to note that [...] that we
propose. Therefore, we will focus only on the structure of conditions and probing-actions
in the rest of this section.



Conditions: 



An example of an IF-clause is [...], where Pi is an atomic condition, e.g.
"Site S1 is linked to Site S2 AND keyword K1 appears in S2 AND keyword [...] exists [...]".

[...]

1. Web Pages: A Web page is a collection of text, pictures, video, and audio objects (and
[...]) on the Web that share the same URL. Two different Web pages
[...]. We will refer to Web pages as "Pages" in the rest of the
document.

2. Web Sites: A Web Site is a collection of one or more Web pages. Pages within a site
are linked by links. A Web Site also has a URL which uniquely identifies it. We
will refer to these as Sites in the rest of the document.

3. Key Words: These are text objects (words, phrases, etc.) found in a Web page and
therefore, by extension, found in a Web site.

4. Links: Links are embedded in Web pages and allow a visitor to move from one
location to another. A Link may take a user from one page to another or may simply
take her to another location on the same page.

5. Visit: Refers to the act of a visitor (anybody who reaches a Web Site) visiting a site.

In addition to these basic entities, we also use some composite entities that can be
derived from the basic entities using some data definition language (for example, if the
basic entities are defined in terms of a relational data model as tables, the composite
entities are defined as relational views, i.e., as SQL expressions). Examples of these
composite entities are:



1. Configuration: A collection of Web Sites that are linked to each other. A
Configuration can have one (trivial case) or more Web Sites. For example, a set of
sites linked in a circular fashion forms a "ring" configuration; a configuration
consisting of a central site that is linked to all other sites in the configuration, with the
other sites being linked only to the central site, forms a "star" configuration.

2. Reach relation: It specifies which sites can be reached from a given site. This
relation can be defined as a transitive closure of the LINKS relation.
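
The two composite entities can be sketched as follows, assuming that LINKS is available as a set of (origin, target) pairs. Treating links as undirected in the "star" test is an assumption of this sketch, because the text does not fix the direction of the links; the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (origin site, target site)


def reach(links: Set[Link]) -> Dict[str, Set[str]]:
    """Transitive closure of LINKS: every site reachable from each site."""
    successors: Dict[str, Set[str]] = {}
    for origin, target in links:
        successors.setdefault(origin, set()).add(target)
        successors.setdefault(target, set())
    closure: Dict[str, Set[str]] = {}
    for start in successors:
        seen: Set[str] = set()
        stack = list(successors[start])
        while stack:  # depth-first walk over outgoing links
            site = stack.pop()
            if site not in seen:
                seen.add(site)
                stack.extend(successors[site])
        closure[start] = seen
    return closure


def is_star(center: str, others: Set[str], links: Set[Link]) -> bool:
    """True if the center is linked to every other site and the other sites
    are linked only to the center (link direction ignored: an assumption)."""
    members = others | {center}
    inside = {frozenset(l) for l in links if l[0] in members and l[1] in members}
    expected = {frozenset((center, s)) for s in others}
    return inside == expected


star_links = {("hub.example", "a.example"), ("hub.example", "b.example"),
              ("a.example", "hub.example")}
chain_links = {("a.example", "b.example"), ("b.example", "c.example")}
```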

Basic and derived entities are modeled as data objects in the corresponding data model.
For example, if the data model is relational, then these entities are modeled as relations
(predicates). If the data model is object-oriented, then they are modeled as data objects.
Without loss of generality, we assume that the model is the relational data model and that we
model these entities as predicates (using relational terminology, we will sometimes call
them "relations" or "tables" and assume that these terms are synonyms). For example,
LINKS is a predicate of the form LINKS(site1, site2), where site1 is the site from which
the link originates and site2 is the site to which it points. Moreover, some of the
predicates are temporal because they (implicitly) change over time. For example, predicate
LINKS may implicitly change over time (because links can be added and removed over time). Note
that this invention is not limited to the relational data model and is applicable to other
data models as well.

Once we have defined the basic entities of the data model (e.g. as predicates/relations), we specify
the atomic conditions that can appear in the IF-clause of the monitor as follows. An
atomic condition can be a predicate, its negation, or a past temporal predicate (a
past temporal predicate is a temporal predicate combined with either a unary past temporal
operator or a binary temporal operator). Some examples of unary past temporal operators
are Always_in_the_past, Sometimes_in_the_past,
Sometimes_within_the_past(T_time_units), and Always_within_the_past(T_time_units),
and an example of a binary temporal operator is Since. Some examples of atomic
conditions are:

1. Key_Words(site, keyword, count) - meaning that the keyword "keyword" has count
"count" at site "site".

2. "LINKS(site1, site2) Sometimes_within_the_past(2 weeks)" - meaning that a link
existed at some point between sites site1 and site2 within the past 2 weeks. Note that
LINKS is a temporal predicate because it implicitly changes over time.

3. "Key_Words(XYZ, "cable modem", count) Since LINKS(XYZ, ABC)" - meaning
that the keyword "cable modem" has appeared at site XYZ since the link was established
between sites XYZ and ABC.
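
To make the temporal operators concrete, the following sketch evaluates them over a sampled history. It assumes that each predicate is observed once per monitoring period and stored as (time, truth value) pairs; this layout, the function names, and the reading of Since (Q held at some sample, and P has held at every later sample) are assumptions of the sketch.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# One sample per monitoring period: (time of the sample, did the predicate hold?)
History = List[Tuple[datetime, bool]]


def sometimes_within_the_past(history: History, now: datetime, window: timedelta) -> bool:
    """True if the predicate held at some sample within the window."""
    return any(held for t, held in history if now - window <= t <= now)


def always_within_the_past(history: History, now: datetime, window: timedelta) -> bool:
    """True if the predicate held at every sample within the window."""
    samples = [held for t, held in history if now - window <= t <= now]
    return bool(samples) and all(samples)


def since(p: History, q: History) -> bool:
    """P Since Q: Q held at some sample and P held at every sample after it."""
    q_times = [t for t, held in q if held]
    if not q_times:
        return False
    last_q = max(q_times)
    return all(held for t, held in p if t > last_q)


start = datetime(1998, 1, 1)
days = [start + timedelta(days=d) for d in range(15)]
now = days[-1]
links = [(t, i == 2) for i, t in enumerate(days)]        # link seen only on the third day
established = [(t, i == 4) for i, t in enumerate(days)]  # link established on the fifth day
keyword = [(t, i >= 5) for i, t in enumerate(days)]      # keyword present from the sixth day on

recent_link = sometimes_within_the_past(links, now, timedelta(weeks=2))
keyword_since_link = since(keyword, established)
```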



Probing Actions 

When the monitoring condition of a trigger is satisfied, the user can be notified about this
event and the monitoring results can be returned to him/her. Unfortunately, these
[...] can overwhelm the user when he/she wants to monitor many sites. For
example, [...]

To address this problem, it is important to explore further what is "going on" when the
monitor of a trigger detects the appropriate change at the site(s), and this is the purpose of
the probing part of the trigger (that is specified in the THEN-clause of a rule).

The THEN clause consists of a sequence of probing operations. We consider the 
following types of probing operations (although this invention does not depend only on 
these operations, and other operations can be considered as well): 

1. Issue an alarm and return appropriate information detected by the monitoring 
component of the trigger back to the user. For example, the trigger 

   IF the keyword "Cable Modem" appears at a site but the keyword "Push
   Technology" does not appear at that site
   THEN issue an alarm and return that information to the user

returns the names of all the sites where the keyword "Cable Modem" appears but not 
the keyword "Push Technology." 

2. Retrieve information either from the Web site(s) or from the temporal predicates
maintained at the monitoring site in a way to be described in Section 3.1 and check if
it satisfies certain conditions. One way to do this is by formulating SQL or other types 
of temporal or non-temporal queries. For example, assume that in the monitoring part 
of a trigger it was detected that a certain configuration among a group of sites was 
broken when a certain keyword appeared in one or more of these sites. Then in the 
probing part we may want to find the list of all sites whose links to those sites where 
the keyword appeared were deleted after the appearance of the keyword. This probing 
action can be expressed with the following three SQL queries: 

(The first query extracts the list of all sites that point to one particular site (e.g.,
www.target.com) in the previous monitoring period (before the keyword appeared).)

SELECT [Web Links Data].[Origin of Link], [Web Links Data].[Target Site],
[Web Links Data].[Monitoring Period], [Web Links Data].Status
FROM [Web Links Data]
WHERE ((([Web Links Data].[Target Site])="www.target.com") AND
(([Web Links Data].[Monitoring Period])="T - 1") AND (([Web Links Data].Status)=Yes));

(The second query extracts the list of all sites that do not have a link to that site in the
current monitoring period.)

SELECT [Web Links Data].[Origin of Link], [Web Links Data].[Target Site],
[Web Links Data].[Monitoring Period], [Web Links Data].Status
FROM [Web Links Data]
WHERE ((([Web Links Data].[Target Site])="www.target.com") AND
(([Web Links Data].[Monitoring Period])="T") AND (([Web Links Data].Status)=No));

(The third query joins the two temporary tables in an SQL query and extracts the list of
those sites that have deleted a link in the current monitoring period.)

SELECT [TEMP: Sites with no current link].[Origin of Link]
FROM [TEMP: Sites with no current link] INNER JOIN [TEMP: sites with earlier link]
ON [TEMP: Sites with no current link].[Origin of Link] =
[TEMP: sites with earlier link].[Origin of Link]
GROUP BY [TEMP: Sites with no current link].[Origin of Link]
ORDER BY [TEMP: Sites with no current link].[Origin of Link];

3. Execute a program written in a general-purpose programming language (e.g. C) that 
examines certain conditions. For example, such program may visit different Web sites 
in some sequence and return the URLs of those sites that satisfy certain properties 
(e.g. have increasingly higher visitation traffic). 

4. Execute a Data Mining Query ([IVA96], [H+96],[SOMZ96],[K+94]). A data mining 
query discovers all the patterns of a certain type (specified as a set of constraints on 
the types of patterns to be discovered). It is stated in a Data Mining Query Language 
(e.g. M-SQL [IVA96], DMQL [H+96], as a meta-query [SOMZ96], or as a template 
of [K+94]) and returns a set of patterns satisfying this query. For example, a data 
mining query may want to find all the patterns (e.g. expressed as association rules 
[AIS93]) that correlate appearances and disappearances of the keywords "Cable 
Modem" and "Push Technologies" across certain specified groups of sites. The 
discovered patterns are returned to the user. The returned patterns may provide a 
better insight into the relationship between these terms and, more generally, better
insights into what is "going on" with these two technologies. 
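
As a sketch of a probing operation of type 2, the "deleted links" probe described above can be run against an in-memory SQLite database from Python. The table and column names (web_links, origin, target, period, status) and the sample rows are simplifications assumed for this sketch; the three queries are folded into a single self-join, which is a design choice of the sketch rather than a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE web_links (origin TEXT, target TEXT, period TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO web_links VALUES (?, ?, ?, ?)",
    [
        ("a.example", "www.target.com", "T-1", 1),
        ("b.example", "www.target.com", "T-1", 1),
        ("a.example", "www.target.com", "T", 1),
        ("b.example", "www.target.com", "T", 0),  # b.example dropped its link
    ],
)

# Sites that linked to the target in period T-1 and no longer do in period T.
query = """
SELECT DISTINCT earlier.origin
FROM web_links AS earlier
JOIN web_links AS later
  ON later.origin = earlier.origin AND later.target = earlier.target
WHERE earlier.target = ?
  AND earlier.period = 'T-1' AND earlier.status = 1
  AND later.period = 'T' AND later.status = 0
ORDER BY earlier.origin
"""
deleted = [row[0] for row in conn.execute(query, ("www.target.com",))]
print(deleted)
```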

To illustrate how the composition of these basic operations works, consider Example 5
from Section 2. The THEN-clause of the trigger from Example 5 consists of a sequence
of 5 operations (that can be defined with operations of type 2, i.e., retrieve-information
operations that can be defined, for example, with SQL queries).

3. Implementation Issues 

The main issue is how to implement the Web monitoring triggers described in
Section 2. An important observation is that the Web is stateless in the sense that it does
not remember its history. For example, there is no way to test whether the volume of
visits at a site has been monotonically increasing within the past 3 months unless the
user maintains the information about the volume of visits over time him/herself (the Web
site does not maintain this information itself). Therefore, it is the responsibility of the
user to maintain the necessary information. Of course, the main question is what
historical information needs to be maintained for monitoring purposes, and where.

3.1 Information to be Maintained at the Monitoring Site 

To monitor triggers, we need to maintain the (temporal) predicates that will allow us
to evaluate the expressions contained in these triggers. This means that we need to
store the current values of these predicates and, in certain cases, their past histories. For
example, assume that we have the temporal predicate "LINKS(site1, site2)
Sometimes_within_the_past(2 months)", meaning that a link existed between site1 and site2 at some time
within the past 2 months. To be able to monitor and evaluate such a predicate, we need to
maintain the current state of predicate LINKS (at the present time) as well as its past
history within the last two months. Since time is continuous, and we can evaluate the
presence of links between site1 and site2 only at some finite set of times, we need to
specify the sampling (monitoring) rate for the evaluation of predicate LINKS and check
the presence of links at the time points corresponding to this sampling rate. For example,
we may decide to check the status of predicate LINKS every day, or every 8 hours, or
every hour. Then we assume that no abnormal changes happen to predicate LINKS
between the sampling points. For example, assume that we sample predicate LINKS
every day. Then if the link between site1 and site2 existed yesterday and still exists
today, we assume that nobody deleted and re-inserted the same link within the last day (but
it is of course possible to delete the link during that day; in this case, the monitor detects
the change at the next sampling point). Similarly, if the temporal predicate is
"Always_past LINKS(site1, site2)", stating that there was always a link between site1 and
site2, we may want to maintain two predicates, LINKS(site1, site2) and
Always_Past_LINKS(site1, site2), in order to monitor the condition "Always_past
LINKS(site1, site2)".

The information about the past histories of temporal predicates appearing in
[...] can be stored centrally at the same site for various users (e.g. at the site of the
[...] monitoring software). This invention is not restricted to a
specific storage method and encompasses both methods. Moreover, when we want to
mention the site where these past histories of predicates are stored, [...]

[...] all the triggers. If the predicate [...]
temporal operator (e.g. Always_past, Sometimes_past, etc.), we store an auxiliary copy of
the predicate pertaining to this operator. For example, for the temporal expression
Always_past LINKS(site1, site2) appearing in one of the triggers, we store the current
copy of LINKS(site1, site2) and an auxiliary predicate Always_Past_LINKS(site1, site2).
Always_Past_LINKS(site1, site2) is true if and only if there was always a link from site1
to site2 in the past. Similarly, we store an auxiliary predicate Sometimes_Past_LINKS for
the temporal expression "Sometimes_past LINKS". In general, we maintain one auxiliary
predicate for every (predicate, temporal operator) pair appearing in one of the monitoring
triggers created by the user. These auxiliary predicates can be maintained using the
methods known to a person having ordinary skill in the art (e.g. as described in
[Chom92]). For example, predicate Always_Past_LINKS is updated every monitoring
period by removing from it the records (site1, site2) that do not appear in the current copy of
LINKS(site1, site2). Note that these auxiliary predicates are non-temporal (they only
simulate temporal predicates but do not have any temporal component as part of their
structure) and can be stored using the conventional methods known to someone with
ordinary skill in the art.
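
The update step for the auxiliary predicates can be sketched as follows. The update of Always_Past_LINKS follows the description above; the update of Sometimes_Past_LINKS (accumulating every link ever seen) is an assumption of this sketch, as are the function names and the sample data.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (site1, site2)


def update_always_past(always_past: Set[Link], current: Set[Link]) -> Set[Link]:
    """Remove the records that do not appear in the current copy of LINKS:
    a link that is missing once can no longer have 'always' existed."""
    return always_past & current


def update_sometimes_past(sometimes_past: Set[Link], current: Set[Link]) -> Set[Link]:
    """Add every link seen in this period (assumed update rule)."""
    return sometimes_past | current


ab = ("a.example", "b.example")
cb = ("c.example", "b.example")
periods: List[Set[Link]] = [{ab, cb}, {ab}, {ab, cb}]  # cb vanished in the second period

always_past = set(periods[0])
sometimes_past = set(periods[0])
for current in periods[1:]:
    always_past = update_always_past(always_past, current)
    sometimes_past = update_sometimes_past(sometimes_past, current)
```

Both auxiliary predicates are plain sets with no time component, so only the current copy of LINKS is needed at each monitoring period.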

One of the steps in predicate maintenance is the process of obtaining the current copy of
the predicate. This means visiting the monitored site(s) and evaluating the present
conditions of the predicate(s) at that site. One of the central issues in this process is what
information should be checked and, even more importantly, what information should be
brought from the monitored site(s) back to the monitoring site. This is a critical issue
because we want to monitor many sites and need to minimize the data to be transferred
back to the monitoring site in order not to clog the communication lines. We will address
this issue in the next section.

3.2 Efficient Methods of Transferring Data to the Monitoring Site 

As was demonstrated in Section 3.1, for monitoring purposes, it is necessary for the
system to bring back at each monitoring period only the current states of the predicates
that are maintained at the monitoring site, such as LINKS, Web_Sites, Key_Words, etc.
Given this data, the past temporal predicates can be evaluated using the methods
described in Section 3.1 (e.g. maintaining the LINKS and Always_Past_LINKS predicates).

To obtain the current copies of these predicates, agents should be created that visit the
sites being monitored and retrieve the corresponding predicates. For example, if we
monitor LINKS(site1, site2), then an agent should go to site site1 and determine if there is
a link from that site to site2. Such agents are created at the time of system development
using the methods known to a person having ordinary skill in the art and are *not*
written by the user at the time monitors are specified by him/her.
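
A minimal sketch of the check such an agent could perform for LINKS(site1, site2), using only the Python standard library. It inspects the HTML of a single page that has already been fetched; fetching the page and crawling the rest of the site are omitted, and the class name, function name, and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the host names that the anchors of one page point to."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            host = urlparse(urljoin(self.base_url, href)).hostname
            if host:
                self.hosts.add(host)


def links_to(page_html: str, site1_url: str, site2_host: str) -> bool:
    """True if the page at site1 contains a link to site2."""
    collector = LinkCollector(site1_url)
    collector.feed(page_html)
    return site2_host in collector.hosts


page = '<a href="http://site2.example/products">Partner</a> <a href="/about">About</a>'
found = links_to(page, "http://site1.example/", "site2.example")
```

Only the truth value of the predicate needs to travel back to the monitoring site, not the page itself.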

An additional optimization issue arises when one can bring less data to the monitoring
site than in the current copy of the predicate being monitored. For example, assume that
the temporal predicate being monitored in the IF-clause of a trigger is "Always_past
LINKS(site1, site2)". As was explained in Section 3.1, we can monitor this predicate by
creating an auxiliary predicate Always_Past_LINKS(site1, site2). Note that this predicate
[...] (such as Always_past [...]) [...] predicates are retrieved from the
Web sites being monitored.

3.3 How to Build the Probing Component
QUESTION: DO WE NEED THIS SECTION

[...] well-known and can be implemented by a person having ordinary skill in the art.
Data mining query languages. There are several existing data mining query languages as

4. Practical Embodiments 

We believe that there are two reasons for monitoring a Web site. The first is the obvious
direct benefit to be derived from acquiring the information that may be contained in one
or more Web sites. To have access to large and distributed information repositories on the
Web, it is necessary to know what information resides at a site. Most commercially
available search engines, like AltaVista and Yahoo, provide this service. There is a second
reason for monitoring the Web, which has to do with the significance that users may
attach to events that occur on the Web, or Web Events. A Web Event is some change in
the state of a Web site (or sites) that may have causal implications to users. Restated,
this means that users may find the actual contents of the sites and the navigational
patterns associated with them (links to other sites) less significant than the change in
contents or the appearance or disappearance of links. We refer to information about such
changes as meta information.

To understand how meta information is of great value
to a variety of business entities (investors, venture capitalists, consulting firms, economic
and financial research agencies, investment bankers, and advertisers, to name some),
consider the following example. Let us consider a microprocessor-making firm (say,
Alpha Corp.) that is in the race for putting more transistors on a chip by using a process
called photolithography. This firm is a part of an alliance that has decided that
photolithography and associated technologies will be the commercially successful route
to take in the future. This firm's Web site is connected (linked) to the Web sites of the firms
that are its alliance partners in this business. A competing technology using high-energy
X-rays is used by another consortium of companies (say, Beta Corp.) whose Web sites are
also linked. Analysts and investors in the microprocessor industry watch developments
in the field closely and wish to be informed of the alliances between companies that offer
different technologies. If a new firm or a group of firms starts offering a technology based
on photolithography that complements Alpha Corp.'s products, this information is of
considerable significance to investors and analysts. They could use our system,
WebWatch, to monitor the appearance of the keyword "photolithography" on the Web.
Further, they could define the triggers in such a way that WebWatch could identify the
entry of these new firms into the field and probe further to see if they are connected to
Alpha Corp.'s alliance or if these firms have created a competing alliance (by forming an
inter-linked configuration of sites). Suppose links to one or more of the firms from Alpha
Corp.'s Web configuration were to disappear from the Web sites of other firms in the
configuration. WebWatch could immediately recognize this as an event and probe for
more details to determine possible causes for the removal of these links. The system's
probe could identify the following causes:

(i) The firms offer a competing technology or are aligned with a competitor:
WebWatch could look for keywords that indicate that these firms are now
offering competing products, or it could look for links from the sites of Beta
Corp.'s alliance to the sites of these firms to determine if they have joined a
competing alliance.

(ii) Those firms have formed an alliance of [...]

[...] its findings through electronic mail. The kind of information that can be
[...] probe would include: entry of new firms or consortiums of firms, formation
of alliances, new products or [...] being offered, [...]
services [...]. A regulatory agency could
use this information to see if there is a need to investigate antitrust issues in the formation of
anticompetitive alliances. An advertiser could monitor some sites to see if there is a
sudden spurt in the hits to that site after a keyword has been introduced on the site. He
could also find out if the spurt in visits to the site happened [...]
from a particular site or if it is correlated with some events. This information would
help the advertiser assess the worth of an advertisement on the site. Analysts and venture
capitalists who are monitoring the microprocessor industry and watching for four key
leading technologies (say, high-energy X-ray, photolithography, extreme ultraviolet,
and Metal Pairing) could effectively monitor the entry of new firms, the positioning of
competing products, the complementary or competing services offered by firms, the
number of visits that these firms and their alliances generate, etc. Economic and financial
research agencies could use WebWatch to determine changes in pricing and
promotion policies (as indicated by keywords on sites), the attention generated by new
developments (hits to a site), macro trends in the industry (more firms offer high-energy
X-rays rather than Metal Pairing), and changes in strategies.

The coupling of monitoring and probing offers two important advantages. The first is the
prevention of information overload. WebWatch does not immediately alert the user as
soon as a minor change in the domain of his/her interest takes place. The system continues to
probe to see if this change is accompanied by events of significance (pre-defined by the
user) and returns the list of those changes. Thus, it avoids the huge information overload
that makes most simple monitoring mechanisms less than useful. The second advantage
is that it allows the user to define a relationship between certain events and their possible
implications. The occurrence of an event (deletion of a link to a site) may be
significantly correlated with one phenomenon (addition of a link from a different site) but
may be entirely meaningless in the context of other phenomena (say, a spurt in the number
of visits to the site). WebWatch allows the user to define triggers that make an explicit
connection between what is a symptomatic event and what needs to be further investigated.
General description section

1. Overview of the main idea
the idea of monitoring changes to Web sites;
overview
what will be described in the rest of the document.

2. Monitoring specification language
a. monitoring component: the IF clause (say that we don't want to monitor
events or activities: no WHEN and WHILE clauses; explain why we
decided not to do this)

b. probing component: 4 parts: data mining queries, SQL queries, event
state changes, return discovered info (or notification) back to the user

3. Implementation

a. Maintaining history: explain why important (statelessness of Internet);
Problem: given the set of monitors, what needs to be stored in
histories, how to store this information (what are the data structures) and
how often this information needs to be brought back

b. Optimization issues: how to bring back a *minimal* amount of data
that still satisfies our monitoring needs. Also, *what* needs to be brought
back to our site.

c. How this information can be brought back to the site. Launch an agent
to do this. With what information should this agent be provided before it
goes to the site? How should this information be supplied to the agent?
What are the algorithms?

d. How to implement the probing component? What are the mechanisms to
implement the probing?



